According to the American Institute of Human Development, a lifetime has four significant life stages: Adolescence (Ages 12-20), Early Adulthood (Ages 20-35), Midlife (Ages 35-50), and Mature Adulthood (Ages 50-80). As people grow older and move from one stage to the next, their characteristics change. In this analysis, we apply natural language processing to a corpus of 100,000 crowd-sourced happy moments to find out how people’s writing habits and sources of happiness change across life stages. The report is organized as follows. Part 2 explores the distribution of sentence length for each age group. Part 3 explores which terms appear most frequently when people describe happy moments. Part 4 uses both tf-idf and Latent Dirichlet allocation (LDA) to explore how happiness differs among age groups.
Let’s first look at the distribution of sentence length for each age group. As the box plots below show, people generally write longer sentences as they age. We are not aware of any study that addresses this finding directly, but a report from Harvard University (https://www.health.harvard.edu/mind-and-mood/how-memory-and-thinking-ability-change-with-age) suggests two possible reasons. First, the brain degenerates with age, so people find it harder to recall the precise word they are looking for and may use longer sentences to express the same idea. Second, the branching of dendrites in the brain increases with age, and connections between distant brain areas strengthen. As a result, the brain becomes better at seeing the entire forest and worse at seeing the leaves, so older people may write longer sentences because they tend to describe the whole picture.
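The grouping behind such box plots can be sketched as follows. This is a minimal illustration, not the code used for the report: the function names, the choice of word count as the length measure, and the treatment of each age range as lower-inclusive and upper-exclusive are assumptions, and the sample records are made up.

```python
from statistics import median

# Life-stage boundaries quoted in the introduction. Treating each range as
# lower-inclusive and upper-exclusive is an assumption made for this sketch.
STAGES = [
    ("Adolescence", 12, 20),
    ("Early Adulthood", 20, 35),
    ("Midlife", 35, 50),
    ("Mature Adulthood", 50, 81),
]

def life_stage(age):
    """Return the life-stage label for an age, or None if out of range."""
    for label, low, high in STAGES:
        if low <= age < high:
            return label
    return None

def median_length_by_stage(records):
    """Map each life stage to the median word count of its moments."""
    lengths = {}
    for age, text in records:
        stage = life_stage(age)
        if stage is not None:
            lengths.setdefault(stage, []).append(len(text.split()))
    return {stage: median(counts) for stage, counts in lengths.items()}

# Made-up (age, text) records for illustration only, not from the corpus.
sample = [
    (18, "I passed my exam"),
    (19, "My friends threw me a party"),
    (27, "I bought a new phone today"),
    (55, "My daughter came home for the weekend and we cooked dinner together"),
]
print(median_length_by_stage(sample))
```

A box plot adds quartiles and outliers on top of the median, but the per-group bucketing step is the same.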
Second, let’s look at the content of the data set. As the word cloud below shows, the following terms appear frequently when people describe happy moments: friend, family, played, daughter, son, watched, wife, games, etc. Most of the significant words fall into two groups: a family group (e.g. son, daughter, and other family members) and a personal life group (e.g. game, play, bought).
Next, let’s look at how happiness differs among age groups. In this part, we apply tf-idf. tf-idf is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. It is the product of two statistics: term frequency and inverse document frequency. The four graphs below show the results of the tf-idf analysis. For people below 20 years old, happiness is related to the following words: marriage, bros, GPA, passed, ups (possibly related to buying something). A large proportion of their happiness relates to their personal lives. In contrast, for people above 50 years old, the top four words are daughter, husband, son, and wife. Taken together, the four graphs suggest that people care more and more about family as they age.
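To make the statistic concrete, here is a small sketch of one common tf-idf definition: term frequency is the count divided by the document length, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the term. Treating each age group as a single document is an assumption about how the analysis was set up, and the tokens below are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf-idf for every term in every document.

    docs maps a document name to a list of tokens. Uses
    tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores

# Toy tokens, invented for illustration; each age group is one "document".
docs = {
    "under_20": ["friend", "passed", "gpa", "friend"],
    "over_50": ["friend", "daughter", "son", "daughter"],
}
scores = tf_idf(docs)
for group, terms in scores.items():
    ranked = sorted(terms, key=terms.get, reverse=True)
    print(group, ranked)
```

In this toy input, "friend" appears in both groups, so its inverse document frequency is ln(2/2) = 0 and it scores zero everywhere. This is why tf-idf surfaces words that distinguish one age group from the others rather than words that are merely frequent.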
In this part, we apply another method, Latent Dirichlet allocation (LDA). LDA is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. Our previous analysis suggests that people’s happiness is generally related to two things: their personal lives and their family members. We therefore set the number of topics to 2 for the LDA analysis. The two graphs below show the 10 most common terms within each topic. Graph 1 (topic 1) has a higher probability of generating the words family and son, so topic 1 may represent family. Graph 2 (topic 2) has a higher probability of generating the words school, job, and gam (likely a stemmed form of game), so topic 2 may represent personal life. Based on these findings, the next part explores the weight of each topic in the different age groups.
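For readers who want to see what fitting a two-topic LDA model involves, the sketch below uses collapsed Gibbs sampling in plain Python. It is an illustration only and not the implementation behind the graphs: the function name lda_gibbs, the hyperparameters alpha and beta, the iteration count, and the toy documents are all assumptions.

```python
import random

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word probabilities, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in row]
        for k, row in enumerate(topic_word)
    ]
    return theta, phi, vocab

# Toy documents, invented for illustration only.
docs = [
    ["son", "daughter", "family", "wife"],
    ["family", "son", "husband", "daughter"],
    ["school", "job", "game", "school"],
    ["game", "job", "school", "bought"],
]
theta, phi, vocab = lda_gibbs(docs, n_topics=2)
for k, row in enumerate(phi):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print("topic", k, [w for _, w in top])
```

Each row of phi is a topic's probability distribution over words, which is what the top-10 term graphs display, and each row of theta is a document's mix of topics. Topic numbering is arbitrary, so which topic ends up as "family" can differ between runs or seeds.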
The scatter plot below shows the weight of these two topics in each age group. As age increases, the points move closer and closer to topic 1, the family topic. This means the share of family in people’s happiness increases with age. This finding confirms the results from tf-idf.
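One way to obtain a per-group topic weight is to average the per-document topic proportions within each age group. The sketch below assumes that approach, which may differ from how the scatter plot was actually produced; the function name and the numbers are hypothetical.

```python
def mean_topic_weights(groups, proportions):
    """Average per-document topic proportions within each group.

    groups[i] is the age-group label of document i and proportions[i]
    is that document's list of topic weights (summing to 1).
    """
    sums = {}
    counts = {}
    for group, weights in zip(groups, proportions):
        if group not in sums:
            sums[group] = [0.0] * len(weights)
            counts[group] = 0
        for k, weight in enumerate(weights):
            sums[group][k] += weight
        counts[group] += 1
    return {
        group: [total / counts[group] for total in totals]
        for group, totals in sums.items()
    }

# Hypothetical values: column 0 is the family topic, column 1 personal life.
groups = ["Adolescence", "Adolescence", "Mature Adulthood"]
proportions = [[0.25, 0.75], [0.5, 0.5], [0.75, 0.25]]
print(mean_topic_weights(groups, proportions))
```

Because each document's weights sum to 1, the group averages also sum to 1, so with two topics a single number per age group is enough to place it between the two topics.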
According to the website senior.com, the following reasons may explain why people care more about family as they age. 1. Retirement is a major disruption to seniors’ social lives, and it is particularly challenging for older adults to make friends. 2. Strong family relationships give seniors a stable and much-needed support system as age makes them increasingly vulnerable.
By analyzing the corpus of 100,000 crowd-sourced happy moments from Amazon, we obtain the following results. 1. People generally write longer sentences as they get older.
2. Based on our corpus, if we do not distinguish by age group, the main sources of happiness are: friend, birthday, game, school, buy, love, talk, completed (something), movie, and family members.